Search Results

Author(s): SUN Y. | TODOROVIC S.
Issue Info:
  • Year: 2010
  • Volume: 32
  • Issue: 9
  • Pages: 1610-1626
Measures:
  • Citations: 1
  • Views: 123
  • Downloads: 0
Keywords: 
Abstract: 

Issue Info:
  • Year: 2023
  • Volume: 20
  • Issue: 1
  • Pages: 39-58
Measures:
  • Citations: 0
  • Views: 91
  • Downloads: 16
Abstract:

Due to the increasing volume of data and the need for detailed analysis, clustering problems that detect the hidden patterns lying in the data remain of great importance. On the other hand, clustering high-dimensional data with traditional methods suffers from many limitations. In this study, a semi-supervised ensemble clustering method is proposed for high-dimensional medical data. In the proposed method, little prior knowledge is available, given as information on similarity or dissimilarity (a number of pairwise constraints). Initially, using the transitive property, we generalize the pairwise constraints to all data. We then divide the feature space into a number of sub-spaces; to find the optimal clustering solution, the feature space is divided randomly into sub-spaces of unequal size. A semi-supervised spectral clustering based on the p-Laplacian graph is performed on each sub-space independently. Specifically, to increase the accuracy of spectral clustering, we use a spectral clustering method based on the graph p-Laplacian, a nonlinear generalization of the graph Laplacian. The result of each clustering solution is compared with the pairwise constraints and, according to the level of agreement, a degree of confidence is assigned to each solution. Based on these degrees of confidence, an ensemble adjacency matrix is formed, which aggregates the results of all clustering solutions across the sub-spaces. This ensemble adjacency matrix is used in the final spectral clustering algorithm to find the clustering solution for the whole space. Since the sub-spaces are generated randomly with unequal numbers of features, clustering results are strongly influenced by the initial values, so it is necessary to find the optimal sub-space set. To this end, a search algorithm is designed: the search process is initialized by forming several sets (we call each set an environment), each consisting of a number of sub-spaces, and an optimal environment is one that yields the best clustering results. The search algorithm uses three search operators, which search all the environments and their sub-spaces both locally and globally; these operators combine two environments and/or replace an environment with a newly generated one, each trying to find the best possible environment in the entire search space or in a local region. We evaluate the performance of the proposed clustering scheme on 20 cancer gene datasets, using the normalized mutual information (NMI) and the adjusted Rand index (ARI) as evaluation criteria. We first examine the effect of different numbers of pairwise constraints. As expected, as the number of pairwise constraints increases, the efficiency of the proposed method also increases; for example, the NMI value increases from 0.6 to 0.9 on the Khan-2001 dataset when the number of pairwise constraints increases from 20 to 100. More pairwise constraints mean more available information, which helps to improve the performance of the clustering algorithm. Furthermore, we examine the effect of the number of random subspaces and observe that increasing it has a positive effect on clustering performance with respect to the NMI value.
In most datasets, once the number of sub-spaces reaches 20, the performance of the proposed method changes little and is stable. Examining the effect of the sampling rate for random subspace generation shows that the proposed method performs best on most cancer datasets, such as Armstrong-2002-v3 and Bredel-2005, when the rate is 0.5; as the rate deviates from 0.5, performance decreases. The results of the proposed idea are then compared with those of the method proposed in reference [21] according to ARI, and our method performs better on 12 of the 20 datasets. Finally, the proposed idea is compared with several metric-learning approaches with respect to NMI. The proposed method obtains the best results on 11 of the 20 datasets and the second-best result on 6 of the 20. For example, on the Bredel-2005 dataset the NMI obtained by the proposed method is 0.1042 higher than reference [21], 0.1846 higher than RCA, 0.4 higher than ITML, and 0.468 higher than DCA. Utilizing ensemble clustering together with the confidence factor improves the ability of the proposed algorithm to achieve better results. Likewise, applying the transitive property and selecting random subspaces of unequal sizes play an important role in achieving better performance. Using p-Laplacian spectral clustering produces better, more balanced clusters of more normal volume than standard spectral clustering. Another factor in the performance of the proposed method is the use of search operators to find the best subspace, which leads to better results.
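
As an aside, the confidence-weighted ensemble step lends itself to a short illustration. The sketch below is only a rough approximation of the idea, not the authors' implementation: it substitutes scikit-learn's standard spectral clustering for the p-Laplacian variant, omits the subspace search operators, and uses illustrative helper names (constraint_agreement, ensemble_clustering) of our own.

```python
# Minimal sketch of a confidence-weighted ensemble over random feature subspaces.
# Assumptions: standard spectral clustering stands in for the p-Laplacian variant,
# and must_link / cannot_link are index pairs supplied as prior knowledge.
import numpy as np
from sklearn.cluster import SpectralClustering

def constraint_agreement(labels, must_link, cannot_link):
    """Fraction of pairwise constraints satisfied by one clustering solution."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    return ok / max(1, len(must_link) + len(cannot_link))

def ensemble_clustering(X, k, must_link, cannot_link, n_subspaces=20, rng=None):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    A = np.zeros((n, n))                              # ensemble (co-association) adjacency matrix
    for _ in range(n_subspaces):
        m = int(rng.integers(low=max(2, d // 10), high=d + 1))  # unequal subspace sizes
        feats = rng.choice(d, size=m, replace=False)
        labels = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                    random_state=0).fit_predict(X[:, feats])
        w = constraint_agreement(labels, must_link, cannot_link)  # confidence degree
        A += w * (labels[:, None] == labels[None, :])
    A /= A.max()
    final = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
    return final.fit_predict(A)                       # final clustering on the ensemble matrix
```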

Issue Info:
  • Year: 2025
  • Volume: 57
  • Issue: 1
  • Pages: 80-95
Measures:
  • Citations: 0
  • Views: 3
  • Downloads: 0
Abstract:

Feature selection is a critical step in machine learning, especially when dealing with high-dimensional and incomplete data. Traditional methods often struggle with missing values, which are common in real-world applications. This paper introduces Neural Network Feature Selection (NNFS), a novel deep-learning-based approach that effectively identifies important features even in the presence of missing data. We provide a variety of comparisons to evaluate the suggested algorithm against existing methods, assessing accuracy, speed, and sensitivity to missing data. According to the numerical results, the proposed algorithm outperforms existing methods, especially for medium-sized datasets. Both simulated and real-data experiments are presented to make the results more realistic.
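
Since NNFS itself is not specified here, the following is a purely hypothetical sketch of one way a neural network can rank features under missing data: impute the values, append missingness indicators, fit a small MLP, and rank the original features by first-layer weight magnitude. It should not be read as the paper's algorithm, and the function name is ours.

```python
# Hypothetical sketch of neural-network-based feature ranking on incomplete data
# (not the paper's NNFS code). Assumes no column is entirely missing and that
# first-layer weight magnitude is an acceptable crude saliency measure.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier

def nn_feature_ranking(X, y, hidden=32, seed=0):
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X).astype(float)                  # missingness indicators
    X_imp = SimpleImputer(strategy="mean").fit_transform(X)
    Z = np.hstack([X_imp, mask])                      # imputed values + indicators
    mlp = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000,
                        random_state=seed).fit(Z, y)
    W = mlp.coefs_[0][: X.shape[1], :]                # weights of the value inputs only
    importance = np.abs(W).sum(axis=1)                # saliency per original feature
    return np.argsort(importance)[::-1]               # features, most important first
```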

Issue Info:
  • Year: 2022
  • Volume: 16
  • Issue: 1
  • Pages: 239-252
Measures:
  • Citations: 0
  • Views: 120
  • Downloads: 0
Abstract:

Introduction: Clustering of high-dimensional data usually runs into problems such as the curse of dimensionality. To overcome such obstacles, dimensionality-reduction methods are often used, typically through one of two approaches: variable selection and variable extraction. Recently, researchers have proposed methods claimed to lose less information when clustering high-dimensional data than other techniques; among them, the one presented by Anderlucci et al. (2021) under the title of Random Projections is very popular. The RP method is based on creating random projections, selecting a small subset of them, and then performing the clustering task. In this article, this method is compared with conventional dimensionality-reduction approaches on three gene expression datasets using four standard clustering criteria: the adjusted Rand index, the Jaccard index, the Fowlkes-Mallows index, and the accuracy index.
Material and Methods: One variable-selection method is the variable-selection approach for clustering based on the Gaussian model; on the other hand, principal component analysis is one of the most popular methods for variable extraction. Another practical and interesting new approach to dimensionality reduction is the Random Projections method. Using a group of random projections, Anderlucci et al. (2021) proposed a clustering algorithm for high-dimensional data. This algorithm obtains the final output through Gaussian mixture model clustering applied to the optimal subset of random projections: the original high-dimensional data are mapped onto the reduced spaces, model-selection criteria are calculated for them, and observations are clustered using the optimal projections.
Results and Discussion: In this paper, the methods proposed by Anderlucci et al. (2021) are described and compared on three gene expression datasets, covering leukaemia, lymphoma, and prostate cancers. Based on the results, under the introduced criteria both competing methods attain lower values than the random projections method and therefore perform worse; the final result is that the random projections method performs better on the three datasets considered. It should be noted that the purpose of the current study was only to compare clustering performance under the three approaches and several clustering criteria, so other analytical aspects of random projection were not considered and will be explored in our future research.
Conclusion: Clustering of high-dimensional data faces various statistical challenges, and different methods exist to overcome the related problems; one practical tool is reducing the data dimension. This article examined random projection from both theoretical and practical aspects, evaluated its performance on three real data sets, compared it with other standard methods, and showed its superiority based on several conventional clustering indices. Future research could address the probabilistic aspects of the random-projections approach by considering proper statistical inference methods.
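
The random-projections pipeline sketched in the abstract can be illustrated roughly as follows. This is a simplified stand-in, not Anderlucci et al.'s implementation: it keeps only the single projection with the best BIC rather than an optimal subset, uses scikit-learn components throughout, and the function name is ours.

```python
# Rough sketch of clustering via random projections: project the data onto several
# random low-dimensional spaces, fit a Gaussian mixture in each, keep the projection
# with the best BIC, and evaluate the labels with the adjusted Rand index.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

def rp_gmm_cluster(X, k, n_projections=50, dim=10, seed=0):
    best_bic, best_labels = np.inf, None
    for i in range(n_projections):
        Z = GaussianRandomProjection(n_components=dim,
                                     random_state=seed + i).fit_transform(X)
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(Z)
        bic = gmm.bic(Z)                      # model-selection criterion per projection
        if bic < best_bic:
            best_bic, best_labels = bic, gmm.predict(Z)
    return best_labels

# Example evaluation against known labels y_true:
# print(adjusted_rand_score(y_true, rp_gmm_cluster(X, k=3)))
```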

Author(s): Najarzadeh D.
Issue Info:
  • Year: 2023
  • Volume: 17
  • Issue: 1
  • Pages: 201-218
Measures:
  • Citations: 0
  • Views: 159
  • Downloads: 76
Abstract:

In multiple regression analysis, the population multiple correlation coefficient (PMCC) is widely used to measure the correlation between one variable and a set of variables. To evaluate the existence or non-existence of this type of correlation, testing the hypothesis of zero PMCC can be very useful. For high-dimensional data, the singularity of the sample covariance matrix means that traditional testing procedures for this hypothesis lose their applicability. A simple test statistic is proposed for zero PMCC based on a plug-in estimator of the inverse of the sample covariance matrix, and a permutation test is then constructed from this statistic to test the null hypothesis. A simulation study was carried out to evaluate the performance of the proposed test on both high-dimensional and low-dimensional normal data sets. The study concludes by applying the proposed approach to data on mouse tumour volumes.
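
A generic version of a permutation test for zero PMCC can be sketched as below. Note that the statistic here is a ridge-regularized R-squared, chosen only so the quantity stays defined when the dimension exceeds the sample size; it is not the paper's plug-in estimator, and the function name is ours.

```python
# Illustrative permutation test for a zero population multiple correlation
# coefficient: compute a regularized R^2 for the observed response, then compare
# it with the same statistic under random permutations of the response.
import numpy as np
from sklearn.linear_model import Ridge

def pmcc_permutation_test(X, y, n_perm=999, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)

    def stat(y_vec):
        pred = Ridge(alpha=alpha).fit(X, y_vec).predict(X)
        resid = y_vec - pred
        return 1.0 - resid.var() / y_vec.var()        # regularized R^2

    observed = stat(y)
    perm = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, p_value
```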

Issue Info:
  • Year: 621
  • Volume: 22
  • Issue: 2
  • Pages: 129-146
Measures:
  • Citations: 0
  • Views: 5
  • Downloads: 0
Abstract:

Graph neural networks and fuzzy models offer effective and practical methods for solving various tasks at the large-scale graph level. Large-scale graph embedding based on deep methods and fuzzy models falls into two categories: fusion and integration. Feature extraction and graph structure at the local and global levels are based on augmented graph fusion; in fusion-based graph embedding, the fuzzy model is used as an activation function within an aggregation process. In some cases, fusing graph neural network methods with fuzzy systems has been successful; however, no effective methods have been developed for integrating fuzzy models with deep methods. Two main issues are associated with this integration: (1) computational complexity, due to the exponential increase in fuzzy rules with the number of features, and (2) the complexity of the solution space, due to the combination of fuzzy regression rules between inputs and outputs. Additionally, modeling at the large-scale graph level using linear regression and graph neural networks is not sufficient. Therefore, this paper proposes a method for combining features and structure at the local and global levels using a combination of fuzzy modeling and graph transformers, an integrated deep learning technique called the Fuzzy Graph Transformer (FuzzyGT). We conducted experiments on graph datasets commonly used in deep learning to compare against the proposed model; our method achieved the best results compared with other advanced models.
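
Purely as an illustration of the integration idea, and emphatically not the FuzzyGT architecture, the snippet below shows a Gaussian fuzzy-membership layer whose per-feature rule activations could feed a downstream graph transformer block; keeping memberships per feature rather than combining them into a rule grid is one way to avoid the exponential rule growth mentioned above. The class name and parameterization are ours.

```python
# Hypothetical fuzzy-membership layer (illustration only, not the FuzzyGT model):
# each input feature is mapped to a few Gaussian rule activations.
import torch
import torch.nn as nn

class FuzzyMembership(nn.Module):
    def __init__(self, in_features, n_rules=3):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(in_features, n_rules))
        self.log_sigmas = nn.Parameter(torch.zeros(in_features, n_rules))

    def forward(self, x):                       # x: (n_nodes, in_features)
        diff = x.unsqueeze(-1) - self.centers   # (n_nodes, in_features, n_rules)
        sigma = self.log_sigmas.exp()
        mu = torch.exp(-0.5 * (diff / sigma) ** 2)   # Gaussian memberships
        return mu.flatten(1)                    # rule activations for a GNN/transformer block

# The flattened output grows linearly (in_features * n_rules) rather than
# exponentially, which is one way to sidestep the rule-explosion issue noted above.
```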

Author(s): ROUHI A. | NEZAMABADI POUR H.
Issue Info:
  • Year: 2018
  • Volume: 15
  • Issue: 4
  • Pages: 283-294
Measures:
  • Citations: 0
  • Views: 1955
  • Downloads: 0
Abstract:

Nowadays, with the advent and proliferation of high-dimensional data, the process of feature selection plays an important role in machine learning and, more specifically, in the classification task. Dealing with high-dimensional data such as microarrays is associated with problems such as an increased presence of redundant and irrelevant features, which leads to decreased classification accuracy, increased computational cost, and the curse of dimensionality. In this paper, a hybrid method using ensemble techniques for feature selection on high-dimensional data is proposed. In the first stage of the proposed method, a filter method reduces the dimensionality of the features; in the second stage, two state-of-the-art wrapper methods are run on the reduced feature subset using the ensemble technique. The proposed method is benchmarked on 8 microarray datasets, and comparison with several state-of-the-art feature selection methods confirms the effectiveness of the proposed approach.
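
A minimal two-stage sketch in the spirit of the described pipeline, using scikit-learn stand-ins rather than the authors' specific filter and wrapper methods: a univariate filter trims the features, then two recursive-feature-elimination wrappers run on the reduced set and their selections are combined by union. The function name and parameter defaults are illustrative.

```python
# Two-stage hybrid feature selection sketch: filter stage + wrapper ensemble stage.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def hybrid_select(X, y, n_filter=200, n_wrapper=20):
    filt = SelectKBest(f_classif, k=min(n_filter, X.shape[1])).fit(X, y)
    keep = np.flatnonzero(filt.get_support())             # stage 1: univariate filter
    selectors = [RFE(LogisticRegression(max_iter=5000), n_features_to_select=n_wrapper),
                 RFE(LinearSVC(max_iter=5000), n_features_to_select=n_wrapper)]
    chosen = set()
    for sel in selectors:                                  # stage 2: wrapper ensemble
        sel.fit(X[:, keep], y)
        chosen.update(keep[sel.get_support()])             # combine by union
    return sorted(chosen)
```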

Issue Info:
  • Year: 2022
  • Volume: 19
  • Issue: 4
  • Pages: 302-312
Measures:
  • Citations: 0
  • Views: 239
  • Downloads: 0
Abstract:

One of the challenges of the high-dimensional outlier detection problem is the curse of dimensionality, where irrelevant dimensions (features) hide outliers. To solve this problem, dimensions containing valuable information for detecting outliers are sought, so that outliers become more prominent and detectable once the dataset is mapped into the subspace constituted by these relevant dimensions/features. This paper proposes an outlier detection method for high-dimensional data by introducing a new locally relevant subspace selection and developing local density-based outlier scoring. First, we present a locally relevant subspace selection method based on local entropy, which selects a relevant subspace for each data point according to its neighbors. Then, each data point is scored in its relevant subspace using a density-based local outlier scoring method. Our adaptive-bandwidth kernel density estimation method eliminates the slight difference between the density of a normal data point and that of its neighbors, so normal data are not wrongly detected as outliers; at the same time, our method underestimates the actual density of outlier data points to make them more prominent. Experimental results on several real datasets show that the local entropy-based subspace selection algorithm and the proposed outlier scoring achieve a high detection accuracy for outlier data.
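
A heavily simplified sketch of the general idea follows; it replaces the local-entropy criterion with the spread of each feature among a point's neighbours and uses a plain Gaussian kernel, so it should be read as an illustration rather than the paper's algorithm. The function name is ours.

```python
# Per-point subspace selection plus a density-based local outlier score.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_subspace_outlier_scores(X, k=20, subspace_size=5):
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                        # idx[:, 0] is the point itself
    scores = np.empty(n)
    for i in range(n):
        neigh = X[idx[i, 1:]]
        spread = neigh.std(axis=0) + 1e-12
        feats = np.argsort(spread)[:subspace_size]   # locally "relevant" features
        h = spread[feats].mean()                     # adaptive bandwidth
        dist2 = ((neigh[:, feats] - X[i, feats]) ** 2).sum(axis=1)
        density = np.exp(-dist2 / (2 * h ** 2)).mean()
        scores[i] = -np.log(density + 1e-300)        # higher score = more outlying
    return scores
```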

Author(s): 
Issue Info:
  • Year: 2023
  • Volume: 27
  • Issue: 6
  • Pages: 1896-1911
Measures:
  • Citations: 1
  • Views: 15
  • Downloads: 0
Keywords: 
Abstract: 

Issue Info:
  • Year: 2019
  • Volume: 7
  • Issue: 4 (Special Issue)
  • Pages: 626-634
Measures:
  • Citations: 0
  • Views: 201
  • Downloads: 85
Abstract:

Traditional logistic regression suffers from degenerate and unstable behavior in high-dimensional classification because of the problem of non-invertible matrices when estimating the model parameters. In this paper, to overcome the high dimensionality of the data, we introduce two new algorithms. First, we improve the efficiency of the finite-population Bayesian bootstrap logistic regression classifier by using a majority-vote rule. Second, by using simple random sampling without replacement to select a number of covariates smaller than the sample size and then applying traditional logistic regression, we introduce another new algorithm for high-dimensional binary classification. We compare the proposed algorithms with regularized logistic regression models and two other classification algorithms, i.e., naive Bayes and k-nearest neighbors, using both simulated and real data.
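
The second algorithm's covariate-subsampling idea can be sketched generically as follows (labels are assumed binary 0/1; the helper name and parameter choices are ours, not the paper's).

```python
# Sketch of covariate subsampling for high-dimensional logistic regression:
# repeatedly draw fewer covariates than the sample size by simple random sampling
# without replacement, fit ordinary logistic regression on each draw, and classify
# by majority vote over the ensemble.
import numpy as np
from sklearn.linear_model import LogisticRegression

def subsampled_logistic_vote(X_train, y_train, X_test, n_models=51, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X_train.shape
    m = min(p, n - 1)                                 # fewer covariates than samples
    votes = np.zeros((n_models, X_test.shape[0]), dtype=int)
    for b in range(n_models):
        feats = rng.choice(p, size=m, replace=False)
        clf = LogisticRegression(max_iter=5000).fit(X_train[:, feats], y_train)
        votes[b] = clf.predict(X_test[:, feats])      # assumes labels are 0/1
    return (votes.mean(axis=0) >= 0.5).astype(int)    # majority vote
```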
